This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.

Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Ctrl+Alt+I.

When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Ctrl+Shift+K to preview the HTML file).

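# colnames() needs one name per column (986), but Fang_genotypes[,1] holds one value per row (2783).
# The first line of fang_et_al_genotypes.txt is a header, so read it as one and the column names are set directly.
Fang_genotypes <- read.table("fang_et_al_genotypes.txt", header = TRUE, sep = "\t", stringsAsFactors = FALSE)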
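`colnames()` expects exactly one name per column, while `Fang_genotypes[,1]` has one value per row. When a file's first line is a header, `read.table(..., header = TRUE)` takes the column names from it. Here is a minimal sketch on a small made-up tab-separated file. The column names and values are hypothetical and only mimic the layout of the genotype table:

```r
# Hypothetical three-sample, two-SNP file laid out like the genotype table
toy_path <- tempfile(fileext = ".txt")
writeLines(c("Sample_ID\tJG_OTU\tGroup\tsnp1\tsnp2",
             "S1\tOTU1\tZMMIL\tA/A\tC/T",
             "S2\tOTU2\tZMPBA\tG/G\tT/T",
             "S3\tOTU3\tTRIPS\t?/?\tC/C"),
           toy_path)

# Without header = TRUE the header line is read as an ordinary data row
no_header <- read.table(toy_path, sep = "\t", stringsAsFactors = FALSE)

# With header = TRUE the first line supplies the column names
with_header <- read.table(toy_path, sep = "\t", header = TRUE,
                          stringsAsFactors = FALSE)

dim(no_header)    # 4 rows, 5 columns
dim(with_header)  # 3 rows, 5 columns
colnames(with_header)
```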

First set of instructions: Describe file structures and their dimensions (file size, number of columns, number of lines, etc.). You don’t have to limit yourselves to the functions we learned in class.

As a reminder, the files are:

• fang_et_al_genotypes.txt: a published SNP data set including maize, teosinte (i.e., wild maize), and Tripsacum (a close outgroup to the genus Zea) individuals
• snp_position.txt: an additional data file that includes the SNP id (first column), chromosome location (third column), nucleotide location (fourth column) and other information for the SNPs genotyped in the fang_et_al_genotypes.txt file
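A small helper can report the three quantities the instructions ask for: file size, line count, and column count. This is a minimal sketch that assumes tab-delimited files. `describe_file()` is a hypothetical helper, not part of any package. It uses base R's `count.fields()` to count fields on every line, so a ragged file would show more than one column count:

```r
# Hypothetical helper: summarise a delimited text file without loading it as a table
describe_file <- function(path, sep = "\t") {
  # Fields on each line; quote and comment handling are turned off
  n_fields <- count.fields(path, sep = sep, quote = "", comment.char = "")
  list(
    size_bytes = file.size(path),          # size on disk in bytes
    n_lines    = length(readLines(path)),  # every line, header included
    n_columns  = unique(n_fields)          # a single value if the table is rectangular
  )
}

# Demonstration on a made-up two-line file. With the real data this would be
# describe_file("fang_et_al_genotypes.txt") or describe_file("snp_position.txt")
demo_path <- tempfile(fileext = ".txt")
writeLines(c("SNP_ID\tChromosome\tPosition", "snpA\t1\t100"), demo_path)
info <- describe_file(demo_path)
str(info)
```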

Fang_genotypes

View(Fang_genotypes)

Add names to the first three columns of fang_et_al_genotypes

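# R names are case-sensitive: the object is Fang_genotypes, not Fang_Genotypes.
# Columns 1-3 hold the sample ID, the OTU label and the group code; the SNP genotypes follow.
names(Fang_genotypes)[1:3] <- c("Sample_ID", "JG_OTU", "Group")

# Column 1 has one ID per sample (row), not per SNP
sample_ids <- data.frame(Sample_ID = Fang_genotypes[, 1])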

Extract the maize groups ZMMIL, ZMMLR and ZMMMR and the teosinte groups ZMPBA, ZMPIL and ZMPJA

dim(maize_genes)
[1] 1546  986
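`%in%` matches a column against a set of group codes and keeps every row whose `Group` is one of them. The following minimal sketch uses a made-up data frame with hypothetical sample IDs and genotypes. It also checks that each requested code actually occurs in the data, because a misspelled code matches nothing and R gives no warning:

```r
# Hypothetical stand-in for Fang_genotypes: one row per sample
toy_genotypes <- data.frame(
  Sample_ID = c("S1", "S2", "S3", "S4", "S5"),
  Group     = c("ZMMIL", "ZMMLR", "ZMMMR", "ZMPBA", "TRIPS"),
  snp1      = c("A/A", "A/G", "G/G", "?/?", "A/A"),
  stringsAsFactors = FALSE
)

maize_groups <- c("ZMMIL", "ZMMLR", "ZMMMR")

# Keep rows whose Group is any of the maize codes
toy_maize <- toy_genotypes[toy_genotypes$Group %in% maize_groups, ]
nrow(toy_maize)

# Requested codes that never occur in the data; a typo such as "ZMMR" shows up here
missing_codes <- setdiff(c(maize_groups, "ZMMR"), toy_genotypes$Group)
missing_codes
```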

Now we will manipulate the two files to join these data sets, so that we have both genotypes and positions formatted such that the first column is “SNP_ID”, the second column is “Chromosome”, the third column is “Position”, and subsequent columns are genotype data from either maize or teosinte individuals.
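One way to reach that layout is to transpose the genotype table so that each SNP becomes a row, and then merge it with the position table on the SNP ID. This is a minimal sketch on made-up miniature tables. It assumes two things: the genotype file has three metadata columns (Sample_ID, JG_OTU, Group) before the SNP columns, and the chromosome and position columns of snp_position.txt have already been renamed to Chromosome and Position:

```r
# Hypothetical miniature versions of the two files
genotypes <- data.frame(
  Sample_ID = c("S1", "S2"),
  JG_OTU    = c("OTU1", "OTU2"),
  Group     = c("ZMMIL", "ZMPBA"),
  snpA      = c("A/A", "G/G"),
  snpB      = c("C/T", "T/T"),
  stringsAsFactors = FALSE
)
positions <- data.frame(
  SNP_ID     = c("snpB", "snpA"),
  Chromosome = c("2", "1"),
  Position   = c(500L, 100L),
  stringsAsFactors = FALSE
)

# Transpose only the genotype columns so that each row becomes one SNP
geno_only  <- genotypes[, -(1:3)]
transposed <- as.data.frame(t(geno_only), stringsAsFactors = FALSE)
colnames(transposed) <- genotypes$Sample_ID
transposed$SNP_ID <- rownames(transposed)

# Join on the shared SNP_ID column (merge() sorts rows by the key),
# then put the columns in the order SNP_ID, Chromosome, Position, samples
joined <- merge(positions, transposed, by = "SNP_ID")
joined <- joined[, c("SNP_ID", "Chromosome", "Position", genotypes$Sample_ID)]
joined
```

Only the SNP columns are transposed. That way the sample metadata does not end up as extra rows mixed in with the genotype calls.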

Extract the maize groups we want from the joined file

Maize_genes_to_extract1 <- c('ZMMIL','ZMMLR', 'ZMMMR')
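Once samples are columns, the Group codes are no longer in the joined table. One approach is to look up the matching sample IDs in the original genotype data and then keep only those columns. This is a sketch on hypothetical miniature tables that reuses the name `Maize_genes_to_extract1`:

```r
# Hypothetical sample metadata (Sample_ID and Group, as in the genotype file)
samples <- data.frame(
  Sample_ID = c("S1", "S2", "S3"),
  Group     = c("ZMMIL", "ZMPBA", "ZMMMR"),
  stringsAsFactors = FALSE
)

# Hypothetical joined table: three position columns, then one column per sample
joined <- data.frame(
  SNP_ID     = c("snpA", "snpB"),
  Chromosome = c("1", "2"),
  Position   = c(100L, 500L),
  S1 = c("A/A", "C/T"),
  S2 = c("G/G", "T/T"),
  S3 = c("A/G", "C/C"),
  stringsAsFactors = FALSE
)

Maize_genes_to_extract1 <- c("ZMMIL", "ZMMLR", "ZMMMR")

# Sample IDs that belong to the maize groups
maize_ids <- samples$Sample_ID[samples$Group %in% Maize_genes_to_extract1]

# Keep the three position columns plus the maize sample columns
maize_joined <- joined[, c("SNP_ID", "Chromosome", "Position", maize_ids)]
colnames(maize_joined)
```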

Extract the teosinte groups ZMPBA, ZMPIL and ZMPJA


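Teos_genes_to_extract2 <- c('ZMPBA', 'ZMPIL', 'ZMPJA')
# Filter on the Group column of Fang_genotypes, where each row is one sample.
# grep() needs both a pattern and a vector to search (its x argument), so grep(Teos_genes_to_extract2) alone fails;
# %in% does an exact match against all three codes at once.
teo_genes <- subset(Fang_genotypes, Group %in% Teos_genes_to_extract2)
dim(teo_genes)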
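`grep()` takes a pattern and the character vector to search (`x`). If you call it with only a pattern, it stops with an "argument "x" is missing" error. When a regular expression really is wanted, a single anchored pattern can cover all three codes. The following small sketch on a made-up vector of group codes compares that approach with `%in%`:

```r
# Hypothetical Group column values
groups <- c("ZMPBA", "ZMPIL", "ZMPJA", "ZMMIL", "TRIPS", "ZMPIL")
Teos_genes_to_extract2 <- c("ZMPBA", "ZMPIL", "ZMPJA")

# One anchored pattern, ^(ZMPBA|ZMPIL|ZMPJA)$, so only whole codes match
pattern  <- paste0("^(", paste(Teos_genes_to_extract2, collapse = "|"), ")$")
by_regex <- grep(pattern, groups)   # indices of matching elements

# %in% gives the same indices without building a regular expression
by_match <- which(groups %in% Teos_genes_to_extract2)

identical(by_regex, by_match)
```

Without the `^` and `$` anchors, the pattern would also match longer codes that merely contain one of the three strings.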